Experiments in Parallel Clustering with DBSCAN
Abstract
We present a new result concerning the parallelisation of DBSCAN, a data mining algorithm for density-based spatial clustering. The overall structure of DBSCAN has been mapped to a skeleton-structured program that performs parallel exploration of each cluster. The approach is useful for improving performance on high-dimensional data and is general with respect to the spatial index structure used. We report preliminary results of the application running on a Beowulf cluster with good efficiency.
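To make "parallel exploration of each cluster" concrete, the following is a minimal sketch and not the paper's skeleton implementation. It assumes a farm-like pattern in which the region queries for the current frontier of a cluster are dispatched to a worker pool. The names `region_query`, `dbscan`, and `workers` are hypothetical, the brute-force scan stands in for whatever spatial index is actually used, and Python threads illustrate the structure only, not real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Brute-force stand-in for a spatial index: indices within eps of point i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts, workers=4):
    """DBSCAN where each cluster is expanded by batches of parallel region queries."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i in range(len(points)):
            if labels[i] is not None:
                continue
            neighbours = region_query(points, i, eps)
            if len(neighbours) < min_pts:
                labels[i] = NOISE  # may later be relabelled as a border point
                continue
            labels[i] = cluster
            frontier = set(neighbours) - {i}
            while frontier:
                batch = []
                for j in sorted(frontier):
                    if labels[j] == NOISE:
                        labels[j] = cluster  # border point, not expanded
                    elif labels[j] is None:
                        labels[j] = cluster
                        batch.append(j)  # needs its own region query
                # Independent queries: this is the part handed to the workers.
                results = pool.map(lambda j: region_query(points, j, eps), batch)
                frontier = set()
                for found in results:
                    if len(found) >= min_pts:  # core point: keep expanding
                        frontier.update(found)
            cluster += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))
```

Because the main loop only touches `labels` between batches, the workers never write shared state. Swapping `region_query` for an index-backed lookup would leave the rest of the sketch unchanged, which mirrors the abstract's claim of independence from the index structure.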
Similar resources
A Study of the Problems of the DBSCAN Clustering Algorithm and a Review of the Improvements Proposed for It
Clustering is an important knowledge discovery technique in databases. Density-based algorithms are among the main clustering methods in data mining. They have several distinctive features: they are independent of cluster shape, easy to understand, and easy to use. DBSCAN is the base algorithm for density-based clustering. DBSCAN is able to...
Clustering Research across Tibetan and Chinese Texts
Tibetan text clustering has potential in the Tibetan information processing domain. In this paper, clustering research across Chinese and Tibetan texts is proposed to benefit Chinese-Tibetan machine translation and sentence alignment. A Tibetan and Chinese keyword table is the main means of implementing text clustering across these two languages. Improved K-means and improved density-based spatia...
A Robust Density-Based Clustering Approach Using DBCURE –MapReduce Techniques
Clustering is the process of grouping similar data into the same cluster and dissimilar data into different clusters. Density-based clustering, as in DBSCAN and OPTICS, is a useful clustering approach. The increasing volume of data and the varying size of data sets make the clustering process challenging. We therefore propose a parallel clustering framework based on MapReduce. We deve...
The DBSCAN Clustering Algorithm by a P System with Active Membranes
The key characteristic of a P system with active membranes is that not only the objects but also the membrane structure evolves. Because the membrane structure can change, such a system can be used in parallel computation to solve clustering problems. In this paper, a P system with active membranes for solving DBSCAN clustering problems is proposed. This new model of P system can reduce th...
Survey and Performance Evaluation of DBSCAN Spatial Clustering Implementations for Big Data and High-Performance Computing Paradigms
Big data is often mined using clustering algorithms. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a popular spatial clustering algorithm. However, it is computationally expensive and thus for clustering big data, parallel processing is required. The two prevalent paradigms for parallel processing are High-Performance Computing (HPC) based on Message Passing Interface ...